introduction

“Macroeconomic forecasts are often made by separating two objectives: providing the growth rate of the variable of interest (usually GDP) and detecting turning points in the economic cycle.” - Doz, Ferrara, and Pionnier, 2020

two models:
- linear regression: point forecasts of future GDP growth
- random forest: uses GDP and other macroeconomic indicators to classify the binary economic state (expansion or contraction)

recession

“The NBER does not define a recession in terms of two consecutive quarters of decline in real GDP. Rather, a recession is a significant decline in economic activity spread across the economy, lasting more than a few months, normally visible in real GDP, real income, employment, industrial production, and wholesale-retail sales” - NBER, 2010

the random forest learns to classify which months are contractionary, as defined by the NBER.

regression datasets

random forest datasets

data sources

annualized gdp change forecast

looking ahead based on February 2020 data:

Month        One Month   Two Months   Three Months
March 2020   6.27%       1.4%         2.62%
April 2020   NA          5.0%         2.02%
May 2020     NA          NA           3.37%

shiny x gdp forecasting

dull side of things

library(dplyr)
library(tidyr)

# build a data frame of the first n_predictors columns, each lagged by n_lags
lagged_predictors <- function(n_lags, predictor_df, n_predictors = 10){
  pred_df <- sapply(1:n_predictors, function(x){
    dplyr::lag(predictor_df[, x], n_lags)
  })
  as.data.frame(pred_df)
}

predictors <- readRDS("./models/predictors.RDS")

# lag the predictors by the number of months chosen in the Shiny UI
predictors_oml <- lagged_predictors(as.numeric(input$n_lags), predictors)

# attach the unlagged response (named so the lm formula can find it)
# and drop the rows lost to lagging
predictors_oml <- cbind(GDP_LOG = predictors$GDP_LOG, predictors_oml) %>%
  drop_na()

# random training/test split; the slider sets the training share
trainIDs <- sample(1:nrow(predictors_oml),
                   input$train_test_adjust * nrow(predictors_oml))
training_predictors_oml <- predictors_oml[trainIDs, ]
testing_predictors_oml  <- predictors_oml[-trainIDs, ]

# fit the linear model and predict on the hold-out set
gdp_oml_lm <- lm(GDP_LOG ~ ., data = training_predictors_oml)
pred_gdp_log_oml <- predict(gdp_oml_lm,
                            newdata = testing_predictors_oml)
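A quick way to gauge the fit is the hold-out root-mean-squared error; a minimal sketch, assuming the prediction objects created above:

```r
# out-of-sample RMSE of the log-GDP predictions
# (assumes pred_gdp_log_oml and testing_predictors_oml from above)
rmse_oml <- sqrt(mean((pred_gdp_log_oml - testing_predictors_oml$GDP_LOG)^2))
rmse_oml
```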

growing the amazon

library(caret)
library(doParallel)

# 10-fold cross-validation, repeated 5 times, with a grid search over mtry
trctrl <- trainControl(method = "repeatedcv",
                       number = 10, repeats = 5, search = "grid")
mtry <- ncol(train_x)                       # upper bound on features per split
ntrees <- 101                               # odd number avoids tied votes
tunegrid <- expand.grid(.mtry = c(2:mtry))
metric <- "Accuracy"

# train in parallel across 5 worker processes
cl <- makePSOCKcluster(5)
registerDoParallel(cl)
rf_recession <- train(x = train_x, y = train_y, method = "rf",
                      metric = metric, trControl = trctrl,
                      tuneGrid = tunegrid, ntree = ntrees)
stopCluster(cl)
registerDoSEQ()  # return foreach to sequential execution
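Once trained, the model can be scored on the hold-out observations; a sketch, where `test_x` and `test_y` stand in for the test-set counterparts of `train_x` and `train_y`:

```r
# score the hold-out set and tabulate predictions against the true labels
# (test_x / test_y are placeholders for the test split)
rf_pred <- predict(rf_recession, newdata = test_x)
caret::confusionMatrix(data = rf_pred, reference = test_y)
```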

beneath the canopy

at each node, a split point s is chosen for a single predictor: observations with values <= s are sent to the left branch, and those with values > s to the right.
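As a tiny worked example of a single split (values chosen purely for illustration):

```r
x <- c(1.2, 3.4, 2.1, 5.0)
s <- 2.5                 # candidate split point
left  <- x[x <= s]       # 1.2, 2.1 go left
right <- x[x >  s]       # 3.4, 5.0 go right
```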

results

with a 70-30 training-test split:

confusion matrix (rows: predicted, columns: reference)

      N     Y
N   120     1
Y     2    13

random forest performance

Precision    Recall       F1           Prevalence   Accuracy
0.9917355    0.9836066    0.9876543    0.8970588    0.9779412
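The metrics follow directly from the confusion-matrix counts, treating N as the positive class (caret's default is the first factor level); as a check:

```r
TP <- 120; FP <- 1; FN <- 2; TN <- 13   # counts from the matrix above
precision  <- TP / (TP + FP)            # 120/121 = 0.9917355
recall     <- TP / (TP + FN)            # 120/122 = 0.9836066
f1         <- 2 * precision * recall / (precision + recall)  # 0.9876543
prevalence <- (TP + FN) / (TP + FP + FN + TN)                # 0.8970588
accuracy   <- (TP + TN) / (TP + FP + FN + TN)                # 0.9779412
```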

probabilistically speaking

pushing it just a little farther

confusion matrix (rows: predicted, columns: reference)

      N     Y
N   120     3
Y     2    10

random forest performance

Precision    Recall       F1           Prevalence   Accuracy
0.9756098    0.9836066    0.9795918    0.9037037    0.962963

probabilistically speaking again

bringing it to the present

Econ. State         April 2020   May 2020
Prob. Expansion     86.14%       92.08%
Prob. Contraction   13.86%       7.92%
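These probabilities are the fraction of trees voting for each class. A sketch of producing them from the trained model, where `current_month` is a placeholder for a data frame holding the latest predictor values:

```r
# per-class probabilities from the trained caret model
# (current_month stands in for the most recent predictor rows)
state_probs <- predict(rf_recession, newdata = current_month, type = "prob")
state_probs   # one column per class: vote share for expansion vs. contraction
```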

questions

source code: github.com/westerleyy/recession_predictR